Computer Vision Introduced

Jack Davis

2026-01-15

Applications in sports

Directly from https://viso.ai/applications/computer-vision-in-sports/

List of visual AI applications in sports

Object Detection

Object detection locates an object of interest (e.g., players, the ball or puck, nets, boundary lines) in a 2D image and draws a bounding box around each one. It extends the OCR (optical character recognition) work we discussed last term with the card-valuation example.

In sports analytics, we often don’t just detect someone as a “person”, but as a specific player, usually using the number on the back of the uniform. Additional video-processing work makes detection of a particular player continuous even when their number is not visible from the camera’s position (e.g., using a player’s previously known position and velocity to infer that a person must be player X because nobody else could be at that location at that time).
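That position-and-velocity idea can be sketched in a few lines. This is an illustrative toy, not any real tracker: the player IDs, coordinates, and distance threshold are all made up.

```python
# Hypothetical sketch: inferring a player's identity when the jersey number
# is hidden, by extrapolating each known track and matching the anonymous
# detection to the nearest predicted position.

def predict_position(track, dt=1.0):
    """Extrapolate a track's next position from its position and velocity."""
    (x, y), (vx, vy) = track["pos"], track["vel"]
    return (x + vx * dt, y + vy * dt)

def assign_identity(detection, tracks, max_dist=5.0):
    """Match a detection (x, y) to the nearest predicted track, if close enough."""
    best_id, best_d2 = None, max_dist ** 2
    for player_id, track in tracks.items():
        px, py = predict_position(track)
        d2 = (detection[0] - px) ** 2 + (detection[1] - py) ** 2
        if d2 < best_d2:
            best_id, best_d2 = player_id, d2
    return best_id

# Player 7 was at (10, 20) moving 2 units/frame in x; player 9 is far away.
tracks = {
    7: {"pos": (10.0, 20.0), "vel": (2.0, 0.0)},
    9: {"pos": (80.0, 5.0), "vel": (0.0, 1.0)},
}
print(assign_identity((12.1, 19.8), tracks))  # prints 7
```

A detection far from every predicted position returns None, i.e., the identity stays unknown until the number is visible again.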

Object Detection

https://viso.ai/wp-content/uploads/2022/02/people-detection-768x432.png

Pose Estimation

Text and image from: https://www.baeldung.com/cs/pose-estimation

“The objective of Pose Estimation, a general problem in computer vision, is to identify the location and orientation of an item or human. In the case of human pose estimation, we typically accomplish this by estimating the locations of various key points like hands, heads, elbows, and so on. These key points in photos and videos are what our machine-learning models seek to track”

“In photos or videos, human pose estimation recognizes and categorizes the positions of human body components and joints. To represent and infer human body positions in 2D and 3D space, a model-based technique is typically used. One particular class of flexible objects includes people. Keypoints will be in different positions concerning others when we bend our arms or legs.”

“2D human pose estimation is estimating the 2D position or spatial placement of key points on the human body from visuals like photos and movies. It is simply the estimations of keypoint locations in 2D space concerning an image or video frame. For every key point, the model predicts an X and Y coordinate.”
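Since a 2D pose model predicts an X and Y per key point, and predictions are often normalized to the [0, 1] range, a common first step is converting them back to pixel coordinates. The key-point names and values below are made up for illustration.

```python
# Sketch: converting normalized (x, y) keypoint predictions to pixel
# coordinates in a frame of known size. Keypoint names/values are illustrative.

def to_pixels(keypoints, width, height):
    """Scale normalized keypoints to integer pixel coordinates."""
    return {name: (round(x * width), round(y * height))
            for name, (x, y) in keypoints.items()}

kps = {"left_elbow": (0.25, 0.50), "right_hand": (0.40, 0.75)}
print(to_pixels(kps, width=640, height=480))
# {'left_elbow': (160, 240), 'right_hand': (256, 360)}
```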

Aside: Homography

Homography is the inference of how something on a 2D image maps to a 3D space. In most sports we have the advantage of having lines marking the dimensions of certain things on the ground. If we can use object detection on those lines, we can infer where the camera is and where it’s pointing. From there, we can infer where somebody is in 3D space.
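Once a homography matrix has been estimated from the detected lines, applying it is just a matrix multiply followed by a projective division. The matrix below is a made-up example, not one estimated from real field lines (in practice you would get it from something like cv2.findHomography on detected line intersections).

```python
import numpy as np

# Sketch: applying a 3x3 homography H to map an image pixel (u, v)
# to 2D field coordinates. H here is a toy, made-up matrix.

H = np.array([[0.10, 0.00, -5.0],
              [0.00, 0.12, -3.0],
              [0.00, 0.00,  1.0]])

def image_to_field(u, v, H):
    """Map pixel (u, v) to field coordinates via projective division."""
    x, y, w = H @ np.array([u, v, 1.0])
    return (float(x / w), float(y / w))

print(image_to_field(320, 240, H))  # approximately (27.0, 25.8) with this toy H
```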

See also: “Improving Robustness of Homography Estimation for Ice Rink Registration” by Jason Shang https://uwspace.uwaterloo.ca/items/95a2a33c-f681-4c5d-9982-e217a002e8be

Source: https://source.roboflow.com/dWImQnUpSGZKKsggZYU1tvb3g7m2/s9PP8EhtBZhpzvmEUMSs/original.jpg

Source: https://www.baeldung.com/wp-content/uploads/sites/4/2023/01/fig2.png

Back to Pose Estimation

Here are a few examples of how those key points (e.g., recognized body parts and joints of a player) are inferred, and how…

…the leg bone’s connected to the hip bone!

Photo from: https://sigmoidal.ai/

Photo from: https://learnopencv.com/
(OpenCV is a C++ library that has both a Python wrapper and an R wrapper)

Pose Estimation

In sports analytics, pose estimation can be used to identify what action is happening (e.g., a bump, set, or spike in volleyball, or a slap shot, wrist shot, or pass in hockey).

It can also be used to identify baseball or cricket pitching/bowling approaches to determine what throw it is, or at what point the type of throw can be identified by a batter.
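One simple pose-derived feature for this kind of classification is a joint angle, such as the elbow angle formed by the shoulder, elbow, and wrist key points. The coordinates below are illustrative; a real throw classifier would use many such features over time.

```python
import math

# Sketch: computing the angle at a joint (here, the elbow) from three
# keypoints. Coordinates are made up for illustration.

def joint_angle(a, b, c):
    """Angle at point b (in degrees) between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # 90.0 (right angle at the elbow)
```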

Image source: https://objectways.com/

Pose Estimation

It can be used to infer which way a player is facing, which can determine their probable range of vision and what part of the field they can control.
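A rough facing direction can be inferred from just the two shoulder key points, taking the direction perpendicular to the shoulder line. This is a simplifying assumption (players can turn their head independently of their shoulders), and the coordinates are illustrative.

```python
import math

# Sketch: estimating facing direction (in degrees) as the perpendicular
# to the line between the left and right shoulder keypoints.
# Image coordinates are assumed, with y increasing downward.

def facing_angle(left_shoulder, right_shoulder):
    """Angle of the vector perpendicular to the shoulder line."""
    dx = right_shoulder[0] - left_shoulder[0]
    dy = right_shoulder[1] - left_shoulder[1]
    # The perpendicular of (dx, dy) is (dy, -dx); take its angle.
    return math.degrees(math.atan2(-dx, dy))

print(facing_angle((0.0, 0.0), (2.0, 0.0)))  # -90.0 for level shoulders
```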

It can be used to infer the load being placed on a player biomechanically to help predict future injury risk.

It is used in most Padel analytics tools on the market today. (Source: https://pub.mdpi-res.com/sensors/sensors-21-03368/article_deploy/html/images/sensors-21-03368-g004.png )

Demonstration in Python

Here we are going to demonstrate object recognition for soccer games in Python.

Unfortunately, public support for computer vision tasks in R has stalled with RopenCV around 2017, so we’re going to use the ultralytics package in Python instead.

Source: https://docs.ultralytics.com/tasks/pose/#models

Install ultralytics first ( https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html#install-in-tool-window ). If you’re using PyCharm, look for “python packages” in the lower left.

YOLO is an image recognition model that has already been pretrained for our use.

from ultralytics import YOLO

Demonstration in Python

“The YOLO pose dataset format can be found in detail in the Dataset Guide. To convert your existing dataset from other formats (like COCO etc.) to YOLO format, please use the JSON2YOLO tool by Ultralytics.”

To train a model, first we can load one.

#model = YOLO("yolo11n-pose.yaml")  # build a new model from YAML
#model = YOLO("yolo11n-pose.pt")  # load a pretrained model (recommended for training)
model = YOLO("yolo11n-pose.yaml").load("yolo11n-pose.pt")  # build from YAML and transfer weights

Then, if we wish, we can train it further on a particular dataset. For example, here is the COCO dataset ( https://cocodataset.org/#keypoints-2018 ), from the COCO 2018 Keypoint Detection Task. (COCO stands for “Common Objects in COntext”.)

Demonstration in Python

# Train the model for 100 epochs on the coco8-pose example dataset
results = model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)

Demonstration in Python

Next, we can validate the model by getting its metrics.

# Get the metrics as an object
metrics = model.val()  # no arguments needed, dataset and settings remembered

This returns a bunch of floating point metrics, which are explained at https://docs.ultralytics.com/guides/yolo-performance-metrics/

Notably: “Average Precision (AP): AP computes the area under the precision-recall curve, providing a single value that encapsulates the model’s precision and recall performance.”
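That area under the precision-recall curve can be approximated with the trapezoidal rule. The (recall, precision) points below are made up for illustration, not outputs from a real model.

```python
# Sketch: Average Precision (AP) as the area under a precision-recall
# curve, computed with the trapezoidal rule on made-up sample points.

def average_precision(recall, precision):
    """Area under the precision-recall curve (trapezoidal rule)."""
    ap = 0.0
    for i in range(1, len(recall)):
        ap += (recall[i] - recall[i - 1]) * (precision[i] + precision[i - 1]) / 2
    return ap

recall    = [0.0, 0.25, 0.50, 0.75, 1.00]
precision = [1.0, 0.90, 0.80, 0.60, 0.40]
print(average_precision(recall, precision))  # ~0.75 for these sample points
```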

Demonstration in Python

and Mean Average Precision (mAP)

https://www.ultralytics.com/glossary/mean-average-precision-map

“mAP at 50: This metric considers a prediction correct if it overlaps with the ground truth by at least 50%.”

“mAP at 50-95: Popularized by the COCO dataset, this is the modern gold standard. It averages the mAP calculated at steps of 0.05 from IoU 0.50 to 0.95. This rewards models that not only find the object but locate it with extreme pixel-level accuracy, a key feature of Ultralytics YOLO11.”
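The overlap being measured here is Intersection over Union (IoU). For two axis-aligned boxes it is a short computation; the boxes below are made-up examples.

```python
# Sketch: Intersection over Union (IoU) for two axis-aligned boxes given
# as (x1, y1, x2, y2). Example boxes are illustrative.

def iou(a, b):
    """Intersection area divided by union area of boxes a and b."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred, truth = (0, 0, 10, 10), (5, 0, 15, 10)
print(iou(pred, truth))  # ~0.33: below 0.5, so not a correct detection at mAP50
```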

metrics.box.map  # map50-95

metrics.pose.map  # map50-95(P)

Demonstration in Python

Now, to see the whole thing in action:

# Load a model
model2 = YOLO("yolo11n.pt")  # pretrained YOLO11n model

# Run batched inference on a list of images
results3 = model2(["image1.jpg", "image2.jpg", "image3.jpg", "image4.jpg"])  # return a list of Results objects
Output:
0: 640x640 1 person, 1 sports ball, 34.8ms
1: 640x640 1 person, 34.8ms
2: 640x640 6 persons, 1 sports ball, 34.8ms
3: 640x640 1 person, 1 sports ball, 34.8ms
Speed: 2.4ms preprocess, 34.8ms inference, 0.3ms postprocess per image at shape (1, 3, 640, 640)

Demonstration in Python

# Process results list
for result in results3:
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs
    result.show()  # display to screen
    result.save(filename="result.jpg")  # save to disk
